Improved HMM Separation for Distant-Talking Speech Recognition
Abstract
In distant-talking speech recognition, recognition accuracy is seriously degraded by reverberation and environmental noise. HMM separation and composition, a robust speech recognition technique for such environments, has been described in [1]. HMM separation estimates the model parameters of the acoustic transfer function from adaptation data uttered from an unknown position in a noisy and reverberant environment, and HMM composition then builds an HMM of noisy and reverberant speech using the acoustic transfer function estimated by HMM separation. Previously, HMM separation has been applied to an acoustic transfer function modeled by a single Gaussian distribution. However, the improvement was smaller than expected for impulse responses with long reverberation. This is because the impulse response of the room reverberation is longer than the spectral analysis window, which increases the variance of the acoustic transfer function in each frame. In this paper, HMM separation is extended to estimate the acoustic transfer function with Gaussian mixture components in order to compensate for this greater variability, and the re-estimation formulae are derived. In addition, this paper introduces a technique that adapts the noise weight for each mel-spaced frequency in order to improve the performance of HMM separation in the linear-spectral domain, where the subtraction operation sometimes produces a negative mean output. The extended HMM separation is evaluated on distant-talking speech recognition tasks, and the experimental results confirm the effectiveness of the proposed method.
Key words: distant-talking speech recognition, HMM separation, reverberation, noise
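The negative-mean problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used linear-spectral model in which the observed power spectrum is the clean speech spectrum times the acoustic transfer function plus weighted noise, O = S·H + k·N. The function names, the toy values, and the hand-picked per-band weights are hypothetical; this is not the paper's re-estimation procedure, which adapts the weights from data.

```python
def compose_mean(speech_mean, transfer_mean, noise_mean, noise_weights):
    """Compose a noisy, reverberant mean per frequency band: O = S*H + k*N."""
    return [s * h + k * n
            for s, h, n, k in zip(speech_mean, transfer_mean, noise_mean, noise_weights)]


def separate_mean(observed_mean, noise_mean, noise_weights):
    """Recover S*H per band by subtracting weighted noise: O - k*N.

    The subtraction can go negative when k*N exceeds O in a band.
    """
    return [o - k * n for o, n, k in zip(observed_mean, noise_mean, noise_weights)]


# Toy linear-spectral means for three bands (illustrative values only).
observed = [4.0, 1.5, 0.8]
noise = [1.0, 1.0, 1.0]

# A single global weight of 1.0 drives the third band negative.
fixed = separate_mean(observed, noise, [1.0, 1.0, 1.0])

# A smaller weight in the third band (chosen by hand here) keeps it positive.
per_band = separate_mean(observed, noise, [1.0, 1.0, 0.5])

print(fixed)
print(per_band)
```

With the global weight, the third band's separated mean is negative, which is not a valid power-spectrum value; lowering the weight for that band alone avoids this without changing the other bands.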
Similar resources
HMM-separation-based speech recognition for a distant moving speaker
This paper presents a hands-free speech recognition method based on HMM composition and separation for speech contaminated not only by additive noise but also by an acoustic transfer function. The method realizes an improved user interface such that a user is not encumbered by microphone equipment in noisy and reverberant environments. The use of HMM composition has already been proposed for co...
Speech recognition in a reverberant environment using matched filter array (MFA) processing and linguistic-tree maximum likelihood linear regression (LT-MLLR) adaptation
Performance of automatic speech recognition systems trained on close-talking data suffers when used in a distant-talking environment due to the mismatch in training and testing conditions. Microphone array sound capture can reduce some mismatch by removing ambient noise and reverberation but offers insufficient improvement in performance. However, using array signal capture in conjunction with Hidd...
Speech recognition for a distant moving speaker based on HMM composition and separation
This paper describes a hands-free speech recognition method based on HMM composition and separation for speech contaminated not only by additive noise but also by an acoustic transfer function. The method realizes an improved user interface such that a user is not encumbered by microphone equipment in noisy and reverberant environments. In this approach, an attempt is made to model acoustic...
An evaluation of adaptive beamformer based on average speech spectrum for noisy speech recognition
Distant-talking speech recognition in noisy environments is indispensable for self-moving robots or tele-conference systems. However, background noise and room reverberations seriously degrade the sound-capture quality in real acoustic environments. A microphone array is an ideal candidate as an effective method for capturing distant-talking speech. AMNOR (Adaptive Microphone-array for NOise Re...
Matching the Acoustic Model to Front-End Signal Processing for ASR in Noisy and Reverberant Environments
Distant-talking automatic speech recognition (ASR) represents an extremely challenging task. The major reason is that unwanted additive interference and reverberation are picked up by the microphones besides the desired signal. A hands-free human-machine interface should therefore comprise a powerful acoustic preprocessing unit in line with a robust ASR back-end. However, since perfect speech e...
Journal title:
- IEICE Transactions
Volume 87-D, Issue
Pages -
Publication date: 2004